On Efficient Approximate Queries over Machine Learning Models
Abstract
The question of answering queries over ML predictions has been gaining attention in the database community. This question is challenging because finding high quality answers by invoking an oracle, such as a human expert or an expensive deep neural network model, on every single item in the DB and then applying the query can be prohibitive. We develop a novel unified framework for approximate query answering by leveraging a proxy to minimize the oracle usage of finding high quality answers for both Precision-Target (PT) and Recall-Target (RT) queries. Our framework uses a judicious combination of invoking the expensive oracle on data samples and applying the cheap proxy on the DB objects. It relies on two assumptions. Under the Proxy Quality assumption, we develop two algorithms: PQA, which efficiently finds high quality answers with high probability and no oracle calls, and PQE, a heuristic extension that achieves empirically good performance with a small number of oracle calls. Alternatively, under the Core Set Closure assumption, we develop two algorithms: CSC, which efficiently returns high quality answers with high probability and minimal oracle usage, and CSE, which extends it to more general settings. Our extensive experiments on five real-world datasets on both query types, PT and RT, demonstrate that our algorithms outperform the state-of-the-art and achieve high result quality with provable statistical guarantees.
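To make the proxy/oracle setting concrete, the following is a minimal Python sketch of a naive precision-target baseline. It is not PQA, PQE, CSC, or CSE, and it carries no statistical guarantee. The function name `select_precision_target` and all of its parameters are hypothetical. It assumes a proxy that returns a score where higher means more likely to satisfy the query, and an oracle that returns the true Boolean label. The oracle is called only on a sample, and the cheap proxy is then applied to every item.

```python
import random
from typing import Callable, List, Sequence, Tuple


def select_precision_target(
    items: Sequence[int],
    proxy: Callable[[int], float],
    oracle: Callable[[int], bool],
    target_precision: float,
    sample_size: int,
    seed: int = 0,
) -> Tuple[List[int], int]:
    """Naive baseline (an illustration, not the paper's algorithm).

    Returns the selected answer set and the number of oracle calls made.
    """
    rng = random.Random(seed)
    sample = rng.sample(list(items), min(sample_size, len(items)))

    # The expensive oracle is invoked only on the sampled items.
    labelled = sorted(((proxy(x), oracle(x)) for x in sample), reverse=True)

    # Walk down the sampled proxy scores and remember the lowest score
    # whose prefix still meets the precision target on the sample.
    threshold = float("inf")
    positives = 0
    for rank, (score, is_positive) in enumerate(labelled, start=1):
        positives += is_positive
        if positives / rank >= target_precision:
            threshold = score

    # The cheap proxy is applied to every item in the database.
    answer = [x for x in items if proxy(x) >= threshold]
    return answer, len(sample)
```

Because the threshold is estimated from a sample alone, the precision of the returned set on the full data can fall below the target. Closing that gap with provable guarantees while keeping oracle calls low is the problem the abstract describes.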
Similar resources
Dust source mapping using satellite imagery and machine learning models
Predicting dust source areas and determining the affecting factors is necessary in order to prioritize management and practices that deal with desertification due to wind erosion in arid areas. Therefore, this study aimed to evaluate the application of three machine learning models (including generalized linear model, artificial neural network, random forest) to predict the vulnerability of dust cent...
Machine Learning Models for Housing Prices Forecasting using Registration Data
This article has been compiled to identify the best model of housing price forecasting using machine learning methods with maximum accuracy and minimum error. Five important machine learning algorithms are used to predict housing prices, including Nearest Neighbor Regression Algorithm (KNNR), Support Vector Regression Algorithm (SVR), Random Forest Regression Algorithm (RFR), Extreme Gradient B...
Efficient Temporal Keyword Queries over Versioned Text
Modern text analytics applications operate on large volumes of temporal text data such as Web archives, newspaper archives, blogs, wikis, and micro-blogs. In these settings, searching and mining needs to use constraints on the time dimension in addition to keyword constraints. A natural approach to address such queries is an inverted index whose entries are enriched with valid-time intervals. I...
Efficient Conjunctive Queries over Semi-Expressive Ontologies Used on Triple Stores
The Semantic Web has become an important movement in the internet during the past years. A general issue in this context is reasoning on large ontologies. Traditional reasoning strategies rely on efficient main memory data structures. As the growing amount of assertional statements and data is more and more reaching the capacity of standard user computers, there is a need for new reasoning stra...
Empirical models based on machine learning techniques for determining approximate reliability expressions
In this paper two machine learning algorithms, Decision Trees (DT) and Hamming Clustering (HC), are compared in building approximated Reliability Expression (RE). The main idea is to employ a classification technique, trained on a restricted subset of data, to produce an estimate of the RE, which provides reasonably accurate values of the reliability. The experiments show that although both met...
Journal
Journal title: Proceedings of the VLDB Endowment
Year: 2022
ISSN: 2150-8097
DOI: https://doi.org/10.14778/3574245.3574273